## Environment Setup
- Create the conda environment: `conda env create -f environment.yml`
- Set up MuJoCo by following the [mujoco-py instructions](https://github.com/openai/mujoco-py)
- Create a directory called `experiments` in the base package directory to store training, evaluation, and checkpoint information
- In a terminal, run `source setup_pythonpath.sh`


## Train
   [comment]: <`python run.py ppo_pytorch --env Reacher-v2 --clip_ratio 0.1 --epochs 300`>
   `python train.py [lof|rm]`

## Run Experiments
- For satisfaction experiments, run `python evaluate_training.py` and `python evalulate_rm.py`
- For composability experiments, run `python evaluate_satisfaction.py`

## Plot
- To plot satisfaction results: `python load_and_plot_data_satisfaction.py`
- To plot composability results: `python load_and_plot_data_composability.py`

## Run and Visualize
The code is still rough, so switching tasks or metapolicies means editing `test_continuous.py` directly:
- To run a different task, change line 514.
- To run a different metapolicy, change the metapolicy class on line 534.
- If necessary, change the trained model name on line 20: replace `model899.pt` with `modelxxx.pt`, where `xxx` is the number of epochs the model was trained for.

Then run `python test_continuous.py`.
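
If you are unsure which `modelxxx.pt` name to put in `test_continuous.py`, a small helper can list the highest-numbered checkpoint for you. This is only a sketch and not part of the repository: the function name `latest_checkpoint` is hypothetical, and it assumes checkpoints are saved as files named `model<N>.pt` in a single directory that you pass in yourself.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints follow the model<N>.pt pattern (e.g. model899.pt).
_CHECKPOINT_RE = re.compile(r"model(\d+)\.pt")


def latest_checkpoint(directory: str) -> Optional[str]:
    """Return the name of the highest-numbered model<N>.pt file, or None."""
    best_name, best_num = None, -1
    for path in Path(directory).iterdir():
        match = _CHECKPOINT_RE.fullmatch(path.name)
        # Compare numerically so that model899.pt ranks above model9.pt.
        if match and int(match.group(1)) > best_num:
            best_name, best_num = path.name, int(match.group(1))
    return best_name
```

Call it with the directory where your training run saved its checkpoints, then copy the returned file name into line 20 of `test_continuous.py`.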

`test_rm.py` is the analogous file for testing the Reward Machines baseline.

Videos are written to a `video/` folder.